In this paper we aim to group visual correspondences in order to detect objects or parts of objects commonly appearing in a pair of images. We first extract visual keypoints from images and establish initial point correspondences between two images by comparing their descriptors. Our method is based on two types of graphs, named relational graphs and correspondence graphs. A relational graph of a point is constructed by thresholding geometric and topological distances between the point and its neighboring points. A threshold value of a geometric distance is determined according to the scale of each keypoint, and a topological distance is defined as the shortest path on a Delaunay triangulation built from keypoints. We also construct a correspondence graph whose nodes represent two pairs of matched points or correspondences and edges connect consistent correspondences. Two correspondences are consistent with each other if they meet the local consistency induced by their relational graphs. The consistent neighborhoods should represent an object or a part of an object contained in a pair of images. The enumeration of maximal cliques of a correspondence graph results in groups of keypoint pairs which therefore involve common objects or parts of objects. We apply our method to common visual pattern detection, object detection, and object recognition. Quantitative experimental results demonstrate that our method is comparable to or better than other methods.
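The clique-enumeration step can be sketched as follows. This is a minimal illustration, assuming the correspondence graph has already been built. The abstract does not say which enumeration algorithm is used, so the Bron-Kerbosch recursion with pivoting is an assumption, and the small example graph and its node numbering are hypothetical.

```python
from typing import Dict, List, Set


def maximal_cliques(adj: Dict[int, Set[int]]) -> List[Set[int]]:
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        # r: current clique, p: candidates that can extend it, x: already explored
        if not p and not x:
            cliques.append(set(r))
            return
        # Pick the pivot with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(adj), set())
    return cliques


# Hypothetical correspondence graph: each node is one correspondence
# (a matched keypoint pair); an edge joins two consistent correspondences.
graph = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4},
    4: {3},
}

# Each maximal clique is one group of mutually consistent correspondences.
groups = sorted(sorted(c) for c in maximal_cliques(graph))
print(groups)
```

In this toy graph the correspondences 0, 1, and 2 are mutually consistent and so form one group, while 3 and 4 only pair up with a single neighbour each.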
Huiyun JING Qi HAN Xin HE Xiamu NIU
We propose a novel threshold-free salient object detection approach which integrates both saliency density and edge response. The salient object with a well-defined boundary can be automatically detected by our approach. Saliency density and edge response maximization is used as the quality function to direct the salient object discovery. The globally optimal window containing a salient object is efficiently located through the proposed saliency density and edge response based branch-and-bound search. To extract the salient object with a well-defined boundary, the GrabCut method is applied, initialized by the located window. Experimental results show that our approach outperforms methods that use only saliency or only edge response and achieves performance comparable to the best state-of-the-art method, while requiring neither a threshold nor multiple iterations of GrabCut.
Kosuke MIZUNO Kenta TAKAGI Yosuke TERACHI Shintaro IZUMI Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
This paper describes a Histogram of Oriented Gradients (HOG) feature extraction accelerator that features a VLSI-oriented HOG algorithm with early classification in Support Vector Machine (SVM) classification, dual core architecture for parallel feature extraction and multiple object detection, and detection-window-size scalable architecture with reconfigurable MAC array for processing objects of several shapes. To achieve low-power consumption for mobile applications, early classification reduces the amount of computations in SVM classification efficiently with no accuracy degradation. The dual core architecture enables parallel feature extraction in one frame for high-speed or low-power computing and detection of multiple objects simultaneously with low power consumption by HOG feature sharing. Objects of several shapes, a vertically long object, a horizontally long object, and a square object, can be detected because of cooperation between the two cores. The proposed methods provide processing capability for HDTV resolution video (1920×1080 pixels) at 30 frames per second (fps). The test chip, which has been fabricated using 65 nm CMOS technology, occupies 4.2×2.1 mm² containing 502 Kgates and 1.22 Mbit on-chip SRAMs. The simulated data show 99.5 mW power consumption at 42.9 MHz and 1.1 V.
Xin HE Huiyun JING Qi HAN Xiamu NIU
Existing salient object detection methods either simply use a threshold to detect the desired salient objects from a saliency map or search for the most promising rectangular window covering salient objects on the saliency map. There are two problems with the existing methods: 1) The performance of threshold-dependent methods depends on threshold selection, and it is difficult to select an appropriate threshold value. 2) The rectangular window not only covers the salient object but also contains background pixels, which leads to imprecise salient object detection. To solve these problems, a novel saliency threshold-free method for detecting the salient object with a well-defined boundary is proposed in this paper. We propose a novel window search algorithm to locate a rectangular window on our saliency map which contains as many pixels belonging to the salient object as possible and as few background pixels as possible. Once the window is determined, GrabCut is applied to extract the salient object with a well-defined boundary. Compared with existing methods, our approach needs neither a threshold to binarize the saliency map nor additional operations. Experimental results show that our approach outperforms 4 state-of-the-art salient object detection methods, yielding higher precision and better F-Measure.
We propose a motion detection model, inspired by neuronal propagation in the hippocampus of the brain, which is suitable for operation at speeds higher than the video rate. The model detects motion of edges, which are extracted from monocular image sequences, on specified 2D maps without image matching. We introduce gating units into a CA3-CA1 model, where CA3 and CA1 are the names of hippocampal regions. We use the function of the gating units to reduce mismatching so that our model can be applied in complicated situations. We also propose a map-division method to achieve accurate detection. We have evaluated the performance of the proposed model by using artificial and real image sequences. The results show that the proposed model can run as fast as 1.0 ms/frame when a 320×240-pixel image is divided into 64×60 units. A detection rate of about 99% for moving edges is achieved under a complicated situation. We have also verified that the proposed model can achieve accurate detection of approaching objects at a high frame rate (>100 fps), which is better than conventional models, provided we can obtain accurate positions of image features and filter out the origins of false positive results in the post-processing.
Dipankar DAS Yoshinori KOBAYASHI Yoshinori KUNO
The detection of object categories with large variations in appearance is a fundamental problem in computer vision. The appearance of object categories can change due to intra-class variations, background clutter, and changes in viewpoint and illumination. For object categories with large appearance changes, some kind of sub-categorization based approach is necessary. This paper proposes a sub-category optimization approach that automatically divides an object category into an appropriate number of sub-categories based on appearance variations. Instead of using predefined intra-category sub-categorization based on domain knowledge or validation datasets, we divide the sample space by unsupervised clustering using discriminative image features. We then use a cluster performance analysis (CPA) algorithm to verify the performance of the unsupervised approach. The CPA algorithm uses two performance metrics to determine the optimal number of sub-categories per object category. Furthermore, we employ the optimal sub-category representation as the basis and a supervised multi-category detection system with a χ² merging kernel function to efficiently detect and localize object categories within an image. Extensive experimental results are shown using a standard database and the authors' own database. The comparison results reveal that our approach outperforms the state-of-the-art methods.
Jonghyun PARK Wanhyun CHO Gueesang LEE Soonyoung PARK
This paper proposes a novel image segmentation method based on Clausius entropy and adaptive Gaussian mixture model for detecting moving objects in a complex environment. The results suggest that the proposed method performs better than existing methods in extracting the foreground in various video sequences composed of multiple objects, lighting reflections, and background clutter.
Ayaka YAMAMOTO Yoshio IWAI Hiroshi ISHIGURO
Background subtraction is widely used in detecting moving objects; however, changing illumination conditions, color similarity, and real-time performance remain important problems. In this paper, we introduce a sequential method for adaptively estimating background components using Kalman filters, and a novel method for detecting objects using margined sign correlation (MSC). By applying MSC to our adaptive background model, the proposed system can perform object detection robustly and accurately. The proposed method is suitable for implementation on a graphics processing unit (GPU) and as such, the system realizes real-time performance efficiently. Experimental results demonstrate the performance of the proposed system.
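The background-estimation idea can be sketched for a single pixel as follows. This is only an illustration under stated assumptions: a scalar Kalman filter with a constant-state model, hypothetical noise variances, and made-up intensity values. The abstract does not give the actual filter design, and margined sign correlation (MSC) is not reproduced here.

```python
def kalman_update(mean, var, observation, process_var=1.0, measurement_var=16.0):
    """One predict/update step of a scalar Kalman filter (constant-state model).

    mean, var: current background estimate and its uncertainty for one pixel.
    process_var, measurement_var: assumed (hypothetical) noise variances.
    """
    # Predict: the background is assumed constant, so only uncertainty grows.
    var_pred = var + process_var
    # Update: blend the prediction with the new observation.
    gain = var_pred / (var_pred + measurement_var)
    new_mean = mean + gain * (observation - mean)
    new_var = (1.0 - gain) * var_pred
    return new_mean, new_var


# Track the background intensity of one pixel over a few frames.
mean, var = 100.0, 10.0
for intensity in [101.0, 99.0, 102.0, 100.0]:
    mean, var = kalman_update(mean, var, intensity)
print(round(mean, 2), round(var, 2))
```

Because every pixel is updated independently with the same few arithmetic operations, a scheme of this kind maps naturally onto one GPU thread per pixel.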
Dipankar DAS Yoshinori KOBAYASHI Yoshinori KUNO
This paper proposes an integrated approach to simultaneous detection and localization of multiple object categories using both generative and discriminative models. Our approach consists of first generating a set of hypotheses for each object category using a generative model (pLSA) with a bag of visual words representing each object. Based on the variation of objects within a category, the pLSA model automatically fits to an optimal number of topics. Then, the discriminative part verifies each hypothesis using a multi-class SVM classifier with merging features that combine the spatial shape and appearance of an object. In the post-processing stage, environmental context information along with the probabilistic output of the SVM classifier is used to improve the overall performance of the system. Our integrated approach with merging features and context information allows reliable detection and localization of various object categories in the same image. The performance of the proposed framework is evaluated on various standard datasets (MIT-CSAIL, UIUC, TUD, etc.) and the authors' own datasets. In experiments we achieved results superior to some state-of-the-art methods over a number of standard datasets. An extensive experimental evaluation on up to ten diverse object categories over thousands of images demonstrates that our system works for detecting and localizing multiple objects within an image in the presence of cluttered background, substantial occlusion, and significant scale changes.
Akinori HIDAKA Kenji NISHIDA Takio KURITA
In this paper, we propose a novel classifier-based object tracker. Our tracker combines a Rectangle Feature (RF) based detector [17],[18] with an optical-flow based tracking method [1]. We show that the gradient of extended RFs can be calculated rapidly by using the Integral Image method. The proposed tracker was tested on real video sequences. We applied our tracker to face tracking and car tracking experiments. Our tracker ran at over 100 fps while maintaining accuracy comparable to the RF based detector. Our tracking routine, excluding image I/O processing, runs at about 500 to 2,500 fps with sufficient tracking accuracy.
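The integral image idea behind fast rectangle features can be sketched as follows. This is a minimal illustration in plain Python, assuming a small grayscale image stored as a list of rows. The function names and the sample image are hypothetical, and the gradient of extended RFs described in the abstract is not shown.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle feature: left half minus right half."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(image)
print(rect_sum(ii, 1, 1, 2, 2))          # 6 + 7 + 10 + 11 = 34
print(two_rect_feature(ii, 0, 0, 3, 4))  # 33 - 45 = -12
```

Once the table is built, any rectangle sum costs four lookups regardless of rectangle size, which is what makes such features cheap to evaluate per frame.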
Daisuke ABE Eigo SEGAWA Osafumi NAKAYAMA Morito SHIOHARA Shigeru SASAKI Nobuyuki SUGANO Hajime KANNO
In this paper, we present a robust small-object detection method, which we call "Frequency Pattern Emphasis Subtraction (FPES)", for wide-area surveillance such as that of harbors, rivers, and plant premises. To achieve robust detection under changes in environmental conditions, such as illuminance level, weather, and camera vibration, our method distinguishes target objects from background and noise based on the differences in frequency components between them. The evaluation results demonstrate that our method detected more than 95% of target objects in the images of large surveillance areas ranging from 30 to 75 meters at their center.
A multi-stage approach that is fast, robust, and easy to train is proposed for a face-detection system. Motivated by the work of Viola and Jones [1], this approach uses a cascade of classifiers to yield a coarse-to-fine strategy that significantly reduces detection time while maintaining a high detection rate. However, it is distinguished from previous work by two features. First, a new stage has been added to detect face candidate regions more quickly by using a larger window size and larger moving step size. Second, support vector machine (SVM) classifiers are used instead of AdaBoost classifiers in the last stage, and Haar wavelet features selected by the previous stage are reused for the SVM classifiers robustly and efficiently. By combining AdaBoost and SVM classifiers, the final system can achieve both fast and robust detection because most non-face patterns are rejected quickly in earlier layers, while only a small number of promising face patterns are classified robustly in later layers. The proposed multi-stage-based system has been shown to run faster than the original AdaBoost-based system while maintaining comparable accuracy.
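The early-rejection behaviour of a classifier cascade can be sketched as follows. This is a toy illustration only: the two stage functions, their thresholds, and the sample windows are hypothetical stand-ins, not the AdaBoost or SVM stages of the proposed system.

```python
from typing import Callable, List, Sequence

Stage = Callable[[Sequence[float]], bool]


def cascade_accepts(window: Sequence[float], stages: List[Stage]) -> bool:
    """Run stages in order and stop at the first rejection."""
    # all() over a generator short-circuits, so later stages are skipped.
    return all(stage(window) for stage in stages)


calls = {"cheap": 0, "costly": 0}


def cheap_stage(window: Sequence[float]) -> bool:
    """Hypothetical coarse test: mean intensity above a threshold."""
    calls["cheap"] += 1
    return sum(window) / len(window) > 0.3


def costly_stage(window: Sequence[float]) -> bool:
    """Hypothetical fine test: enough intensity variation in the window."""
    calls["costly"] += 1
    mean = sum(window) / len(window)
    variance = sum((v - mean) ** 2 for v in window) / len(window)
    return variance > 0.01


windows = [[0.1, 0.1, 0.2], [0.5, 0.9, 0.4], [0.6, 0.6, 0.6]]
results = [cascade_accepts(w, [cheap_stage, costly_stage]) for w in windows]
print(results, calls)
```

The first window is rejected by the cheap stage, so the costly stage runs on only two of the three windows, which is the source of the speed-up in a coarse-to-fine design.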
Osamu NOMURA Takashi MORIE Keisuke KOREKADO Teppei NAKANO Masakazu MATSUGU Atsushi IWATA
Real-time object detection or recognition technology is becoming more important for various intelligent vision systems. Processing models for object detection or recognition from natural images should tolerate pattern deformations and pattern position shifts. Hierarchical convolutional neural networks are considered a promising model for robust object detection/recognition. This model requires huge computational power for a large number of multiply-and-accumulation operations. In order to apply this model to robot vision or various intelligent real-time vision systems, its LSI implementation is essential. This paper proposes a new algorithm for reducing multiply-and-accumulation operations by sorting neuron outputs by magnitude. We also propose an LSI architecture based on this algorithm. As a proof of concept for our LSI architecture, we have designed, fabricated and tested two test LSIs: a sorting LSI and an image-filtering LSI. The sorting LSI is designed based on the content addressable memory (CAM) circuit technology. The image-filtering LSI is designed for parallel processing by an analog circuit array based on the merged/mixed analog-digital approach. We have verified the validity of our LSI architecture by measuring the LSIs.
Hironobu FUJIYOSHI Takeo KANADE
This paper describes a method for detecting multiple overlapping objects from a real-time video stream. Layered detection is based on two processes: pixel analysis and region analysis. Pixel analysis determines whether a pixel is stationary or transient by observing its intensity over time. Region analysis detects stationary regions of stationary pixels corresponding to stopped objects. These regions are registered as layers on the background image, and thus new moving objects passing through these layers can be detected. An important aspect of this work derives from the observation that legitimately moving objects in a scene tend to cause much faster intensity transitions than changes due to lighting, meteorological, and diurnal effects. The resulting system robustly detects objects at an outdoor surveillance site. For 8 hours of video evaluation, a detection rate of 92% was measured, which is higher than traditional background subtraction methods.
In this paper, we explore the possibility of applying associative memories for locating frontal views of human faces in complex scenes. An appealing property of the associative-memory-based face detection system is that learning of the associative memory may be achieved by using a simple Hebbian learning rule. In addition, a simple heuristic rule is used to quickly filter out a certain number of nonface images at the very beginning of the whole detection procedure. By using the rule, we avoid wasting computational resources on those nonface images. A database consisting of 74 images was used to test the performance of our associative-memory-based human face detection system.
Naoya OHTA Kenichi KANATANI Kazuhiro KIMURA
We show that moving objects can be detected from optical flow without using any knowledge about the magnitude of the noise in the flow or any thresholds to be adjusted empirically. The underlying principle is viewing a particular interpretation about the flow as a geometric model and comparing the relative "goodness" of candidate models measured by the geometric AIC.